9 Hardware Reliability

Authors

  • Irene Eusgeld
  • Bernhard Fechner
  • Felix Salfner
  • Max Walter
  • Philipp Limbourg
  • Lijun Zhang
Abstract

In the IT field, the term “fault tolerance” is often used to mean “reliability improvement”. The question to be clarified is the relationship between reliability and fault tolerance. In a general sense, reliability is understood as the ability of a component or system to function correctly over a specified period of time, usually under predefined conditions. Fault tolerance is defined as the ability of a system to continue operation in the event of a failure: a computer system or component is designed such that, if a component fails, a backup component or backup procedure can immediately take its place with no loss of functionality. Reliability can be improved through fault tolerance. Metrics of “classical” reliability theory are numerous and well known. Metrics of fault tolerance are less common; examples include the number of tolerated faults, the number of checkpoints, and the reconfiguration time. The most important method supporting fault tolerance and reliability is redundancy, that is, the duplication of components or repetition of operations to provide alternative functional channels in case of failure. Redundancy can be implemented in different ways: structural (hot and standby redundancy), temporal, functional, etc. Applying redundancy always increases cost and/or complexity, and sometimes introduces synchronisation problems. Predicting system reliability by modelling during the design phase and measuring the parameters of a real system are two completely different approaches. This chapter is sub-divided into five sections depending on the primary goal of the readers. The sections are presented as a set of references structured according to the various reliability metrics (RM). An index is provided at the end of the book so that specific issues can be referenced directly. The chapter is organised as follows:

Similar resources

A Microprocessor-Based Hybrid Duplex Fault-Tolerant System

Reliability is one of the fundamental considerations in the design of industrial control equipment. The microprocessor-based Hybrid Duplex fault-tolerant System (HDS) proposed in this paper has high reliability to meet this demand although its hardware structure is simple. The hardware configuration of HDS and the fault tolerance of this system are described. The switching control strategies in...

Security-aware register placement to hinder malicious hardware updating and improve Trojan detectability

Nowadays, most designers prefer to outsource some parts of their design and fabrication process to third-party companies due to reliability problems, manufacturing cost and time-to-market limitations. In this situation, there are many opportunities for malicious alterations by the off-shore companies. In this paper, we propose a new placement algorithm that hinders the hardwa...

Reliability improvement of the DCS Board used in the ALICE Experiment

The DCS Board is an Embedded System that is part of the ALICE Detector Control System and employed as interface layer between detector hardware and control environment. One hardware design is used in multiple applications and adapted based on a combination of a dedicated processor and configurable hardware to special detector needs. 1040 pieces of the DCS Board are produced for 9 detectors and ...

Hierarchical Approach to Specification and Verification of Fault-tolerant Operating Systems

The goal of formal methods research in the Systems Validation Methods Branch (SVMB) at NASA Langley Research Center (LaRC) is the development of design and verification methodologies to support the development of provably correct system designs for life-critical control applications. Specifically, our efforts are directed at formal specification and verification of the most critical hardware and sof...

A quantitative software testing method for hardware and software integrated systems in safety critical applications

Most of today’s Safety Instrumented Systems (SIS) are hardware and software integrated systems. In these systems, failures can occur in both hardware and software. Hardware failures and their effects have been studied extensively in the literature. However, the methods and results dealing with hardware failure are not directly applicable for software reliability modeling, due to the difference ...

A Logical Structure based Reliability Evaluation Model for Information Systems and its Application

In order to evaluate the reliability of information systems with a hybrid structure, a logical structure based reliability evaluation model is proposed for assessing the hardware and software reliability of an information system, in which the hybrid structure is used to construct the reliability model, and the reliability value is used to quantitatively describe the reliabilit...

Publication date: 2008